We present TAO, a software testing tool that performs automated test and oracle generation based on
a semantic approach. TAO entangles grammar-based test generation with automated semantics evaluation
using a denotational semantics framework. We show how TAO can be integrated with the Selenium
automation tool for automated web testing, and how TAO can be further extended to support automated
delta debugging, where a failing web test script is systematically reduced based on grammar-directed
strategies. A real-life parking website is used throughout the paper to demonstrate the effectiveness
of our semantics-based web testing approach.


1 Introduction 

With the explosive growth of web applications in the last two decades, the demand for their quality
assurance, including reliability, usability, and security, has grown significantly. Software testing
has been an effective approach to ensuring the quality of web applications [14, 4]. In practice,
programmers and test engineers typically construct test cases either manually or using industrial testing
automation tools such as Selenium [18], Watir [21], and Sahi [13]. These tools provide functionality for
recording and replaying a sequence of GUI events as an executable unit test script, and provide web
drivers for visualizing testing results. However, even with these tools available, web application
testing remains difficult and time-consuming for two reasons. (1) Constructing web-based test scripts
is mainly manual; therefore, obtaining sufficient test scripts with reasonable coverage to expose
application failures is challenging. (2) Practical web testing tools allow users to construct unit
tests or test suites; however, once a failing test script is found, that is, a script whose execution
exposes a failure of the web application under test, debugging and locating the precise fault-inducing
GUI actions remains a tedious manual activity, since these tools offer no functionality, such as
automated reduction of failing test cases, to support automated debugging.

In this paper, we first introduce a declarative tool, named TAO, which performs automated test and
oracle generation based on the methodology of denotational semantics [17, 11, 16]. TAO combines our
previous work on a grammar-based test generator [6] and a semantics-based approach for test oracle
generation [?], using a formal framework supporting denotational semantics. TAO takes as inputs
a context-free grammar (CFG) and its semantic valuation functions, and produces test cases along with
their expected behaviors in a fully automatic way.

Secondly, we present a new automated web testing framework that integrates TAO with Selenium-based
web testing for functional testing of web applications. Our framework incorporates grammar-based
testing and semantics-based oracle generation into Selenium web testing automation to generate an
executable JUnit test suite. Selenium [18] is an open source, robust set of tools that supports rapid
development of test automation for Web-based applications. The JUnit test scripts can then be run
against modern web browsers.

Thirdly, we show that TAO can be easily extended to support automated delta debugging. TAO utilizes a
grammar-based test generator to derive a structured test case from a given CFG. We show that
grammatical structures are also valuable for reducing failing test cases while maintaining syntactic
validity. Inspired by previous delta debugging approaches [22, ?], we present multiple grammar-directed
reduction strategies that can be applied to reduce a failing web test script automatically based on its
hierarchical structure. In addition, semantics-based test oracles obtain expected semantic results from
a recursive denotation on each grammar structure. As a test case is reduced for debugging, its expected
testing behavior can be adjusted simultaneously as an instant oracle, which is critical to promoting
automated web testing in practice.

To demonstrate the effectiveness of our semantics-based web testing approach, we show in detail
how it can be applied to testing a real-life parking calculator website at Gerald Ford International Airport.

The rest of the paper is organized as follows. Section 2 introduces our previous work on TAO.
Section 3 presents our web testing framework incorporating TAO with Selenium-based web testing, and
illustrates the approach with a practical web testing example. Section 4 presents a new grammar-directed
delta debugging approach (GDD) utilizing grammar-directed reduction strategies and semantics-based
instant oracles. Section 5 shows experimental results on web testing and automated debugging. Section 6
addresses other related research work. Finally, conclusions are given in Section 7.

2 TAO 

TAO is an integrated tool that performs automated test and oracle generation based on the methodology
of denotational semantics. It extends a grammar-based test generator [6] with a formal framework
supporting the three components of denotational semantics: syntax, semantic domains, and the valuation
functions from syntax to semantics. It provides users with a general Java interface to define a semantic
domain and its associated methods, which is integrated with TAO to support semantic evaluation. TAO takes
as inputs a context-free grammar (CFG) and its semantic valuation functions, and produces test cases
along with their expected behaviors in a fully automatic way. An online version of TAO is available
at [1].

Denotational semantics [17, 11, 16] is a formal methodology for defining language semantics, and
has been widely used in language development and practical applications. Broadly speaking, for a
web-based application under test (WUT) which requires grammar-based structured inputs, the specification
of the structured inputs is a formal language; for those testing scripts (or methods) running together with
a WUT, the specification of those scripts is a formal language. Denotational semantics is concerned
with finding mathematical objects called domains that capture the meaning of an input sentence (the
expected result of the WUT) or the semantics of a testing script (the running behavior of the script
itself along with the WUT).

Example 1 Consider a Java application, which takes an infix arithmetic expression and performs its
integer evaluation. We use TAO to generate test inputs (arithmetic expressions) and their expected results.

To support denotational semantics, TAO provides a general interface for users to define a semantic
domain and its associated operations as a Java class, named Domain.java. For Example 1, the prototype
of the semantic domain, as shown in Figure 1(a), may contain an integer variable, which will eventually hold
the semantic result, and a set of methods, such as intAdd, intSub, intMul, and intDiv, supporting the basic
integer arithmetic operations.

Figure 1(b) shows the input file for TAO, which contains both CFG rules and their associated semantic
valuation functions. As shown in Figure 1(b), each CFG production rule is equipped with a Lisp-like
list notation denoting a semantic valuation function, separated by a delimiter ‘@@’ from the CFG rule.
A semantic valuation function, named a semantic term in this context, can be either a singleton, such
as a variable in the associated CFG production rule or any constant, or a fully parenthesized prefix list
notation denoting an application of a valuation function, where the leftmost item in the list (or nested
sublist) is a semantic method defined by Domain.java.


Semantic Domain: int
Semantic Operations:
    intAdd: int x int → int
    intSub: int x int → int
    intMul: int x int → int
    intDiv: int x int → int

(a)

(1) E   ::= F        @@ F
(2) E   ::= E + F    @@ (intAdd E F)
(3) E   ::= E - F    @@ (intSub E F)
(4) F   ::= T        @@ T
(5) F   ::= F * T    @@ (intMul F T)
(6) F   ::= F / T    @@ (intDiv F T)
(7) T   ::= [N]      @@ [N]
(8) T   ::= (E)      @@ E
(9) [N] ::= 1 .. 1000

(b)

Figure 1: (a) Semantic Domains; (b) CFG and their Valuation Functions


Consider the rule in line (2); it means that if a test case contains a grammar structure E + F, its
corresponding semantic value is denoted by a λ-expression λE.λF.(intAdd E F), where the formal
arguments E and F are omitted because they are implied by the CFG rule itself. If the semantic term is a
singleton (e.g., in line (1)), it simply returns the semantic result of the singleton; otherwise, it triggers an
associated operation (e.g., intAdd(E, F)) as defined in the domain class, assuming the semantic values
of E and F have been obtained recursively. Note that the occurrences of E and F on the right of ‘@@’
denote their respective semantic values, and the variable [N] is a symbolic terminal, denoted by a pair of
square brackets, representing a finite domain of integers from 1 to 1000.
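To make the evaluation of semantic terms concrete, the following is a minimal sketch, not TAO's actual Domain.java or evaluator. It assumes a hypothetical class DomainSketch that defines the four operations of Figure 1(a) and a small recursive evaluator, eval, for fully parenthesized prefix terms whose singletons are integer constants.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not TAO's actual Domain.java or evaluator.
class DomainSketch {
    // Semantic operations over the int domain, named as in Figure 1(a).
    static int intAdd(int a, int b) { return a + b; }
    static int intSub(int a, int b) { return a - b; }
    static int intMul(int a, int b) { return a * b; }
    static int intDiv(int a, int b) { return a / b; }

    // Evaluates a fully parenthesized prefix term such as "(intAdd 3 4)".
    static int eval(String term) {
        List<String> tokens = new ArrayList<>();
        for (String t : term.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+")) {
            tokens.add(t);
        }
        int[] pos = {0};
        int result = evalTokens(tokens, pos);
        if (pos[0] != tokens.size()) {
            throw new IllegalArgumentException("trailing tokens");
        }
        return result;
    }

    private static int evalTokens(List<String> tokens, int[] pos) {
        String tok = tokens.get(pos[0]++);
        if (!tok.equals("(")) {
            return Integer.parseInt(tok);        // singleton: a constant
        }
        String op = tokens.get(pos[0]++);         // leftmost item names the operation
        int left = evalTokens(tokens, pos);       // operands are evaluated recursively
        int right = evalTokens(tokens, pos);
        if (!tokens.get(pos[0]++).equals(")")) {
            throw new IllegalArgumentException("expected )");
        }
        switch (op) {
            case "intAdd": return intAdd(left, right);
            case "intSub": return intSub(left, right);
            case "intMul": return intMul(left, right);
            case "intDiv": return intDiv(left, right);
            default: throw new IllegalArgumentException("unknown operation " + op);
        }
    }
}
```

Under these assumptions, the test input 3 * (8 - 4) corresponds to the semantic term (intMul 3 (intSub 8 4)), which the sketch evaluates by applying intSub before intMul.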


2.1 Tagging Variables 

In automated test script generation, it would be ideal if runtime assertions could be automatically
embedded into a test script, so that when a test script is invoked for software testing, the running result
immediately indicates either success or failure of testing; otherwise, a post-processing procedure is
typically required to check the running result against the oracle.

TAO provides an easy tagging mechanism for users to embed expected semantic results into a generated
test case. It allows users to create a tagging variable as a communication channel for passing results
from semantics generation to test generation. A tagging variable has the form $[N], where N can be
any non-negative integer. A tagging variable can be defined in front of any semantic term <SemTerm>,
either a singleton or a fully parenthesized prefix list notation, in the form $[N]: <SemTerm>.

Example 2 Suppose we add the following two grammar rules at the beginning of the CFG in Figure 1:

TD ::= E Assert @@ $[1] : E
Assert ::= ’=’ $[1]

where TD is the new main CFG variable deriving an arithmetic expression together with its expected
evaluation result. Thus, we may get a sample test case: 3 * (8 - 4) = 12, where 12 is the expected semantic
value obtained by the tagging variable $[1].

Each tagging variable has its application scope in deriving a test case. The first rule above, for TD, allows
the tagging variable $[1] to record the value of the semantic term E, and allows any occurrence of $[1]
to be replaced by its recorded value within the scope of deriving “E Assert” during test generation.
TAO allows users to define multiple tagging variables in a single semantic function, both for catching
semantic results and for embedding runtime assertions. Intermediate semantic values can also be recorded
and embedded into test scripts.
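The substitution step can be illustrated with a small sketch. This is not TAO's implementation; the class TagSketch and its methods embed and example are hypothetical names, and the sketch assumes that recorded values are kept in a map from the tag number N to the text that replaces $[N].

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of tagging-variable substitution; not TAO's implementation.
class TagSketch {
    // Replaces each occurrence of $[n] in the derived text by its recorded value.
    static String embed(String derived, Map<Integer, String> recorded) {
        String result = derived;
        for (Map.Entry<Integer, String> e : recorded.entrySet()) {
            result = result.replace("$[" + e.getKey() + "]", e.getValue());
        }
        return result;
    }

    // Mirrors Example 2: record the value of E in $[1], then derive "E Assert".
    static String example() {
        Map<Integer, String> recorded = new HashMap<>();
        recorded.put(1, String.valueOf(3 * (8 - 4)));   // semantic value of E
        return embed("3 * (8 - 4) = $[1]", recorded);
    }
}
```

In this sketch the scope of a tagging variable is simply the string passed to embed; a faithful implementation would restrict replacement to the subtree derived from the rule that defines the tag.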


3 TAO-based Web Testing Framework 


Figure 2 presents an automated web testing framework based on our testing tool TAO and Selenium
browser automation. The framework consists of the following main procedures. (i) A WUT is modeled
using the methodology of denotational semantics, where CFGs are used to represent the GUI-based
execution model of the WUT, semantic domains are used to describe functional behaviors of the WUT,
and valuation functions map user interactions to expected web behaviors. (ii) TAO takes the denotational
semantics of the WUT as an input, and automatically generates a suite of JUnit tests supported by the
Selenium browser automation tool. Each JUnit test contains a GUI scenario of the WUT with the
expected WUT behaviors embedded. (iii) Through Selenium’s web drivers, a suite of JUnit test scripts can
be executed to test different scenarios of the WUT. The actual running behaviors of the WUT are
automatically collected and compared against the expected behaviors for consistency checking. (iv) Once a
failing test script is found, that is, a test script whose actual behaviors are inconsistent with
its expected ones, TAO invokes a grammar-directed delta debugging strategy to repeatedly reduce the
failing test case to a minimized one for automated debugging. In this section, we address the first
three procedures in detail; procedure (iv) is explained in Section 4.


[Figure 2 diagram: Denotational Semantics (CFG, semantic domains, valuation functions); Selenium Browser Automation; Test Scripts & Oracles in Selenese; passing test]

Figure 2: Automated Web Testing Framework


Selenium [18] is an open source, robust set of tools that supports rapid development of test
automation for Web-based applications. It provides a test domain-specific language, named Selenese, to write
test scripts in a number of popular programming languages, including Java. The test scripts can then be
run against most modern web browsers through Selenium’s web drivers.

3.1 A Web Application under Test: A Parking Calculator

We use a parking calculator website at Gerald Ford International Airport^1 to demonstrate how our
automated web testing framework works in practice, integrating the automation of TAO with Selenium.
Figure 3(a) shows the parking calculator GUI on the website, where users can select a parking lot type,
entry and exit dates and times, and press the “Calculate” button.


Short-Term (hourly) Parking 

$2.00 first hour; $1.00 each additional 1/2 hour 
$24.00 daily maximum 


Long-Term Garage Parking 

$2.00 per hour 
$13.00 daily maximum 
$78.00 per week (7th day free) 

(b) Partial 2014 Parking Rates 
Figure 3: Parking Calculator and Rates 

Figure 3(b) shows part of the parking rates, for the short-term and long-term garage parking lots,
adopted by the Gerald Ford International Airport in 2014.
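The rates in Figure 3(b) can be read as fee functions over a parking duration. The following is a rough sketch only, under assumptions that the rate table does not state: durations are given in minutes, partial periods are rounded up, and the daily maximum applies to each 24-hour period. The class PriceSketch and its methods are hypothetical and are not the authors' price operation.

```java
// Hypothetical fee sketch for two lot types; the rounding rules are assumptions.
class PriceSketch {
    static final long DAY = 24 * 60;     // minutes per day
    static final long WEEK = 7 * DAY;    // minutes per week

    // Short-term: $2 first hour, $1 per additional half hour, $24 daily maximum.
    static double shortTerm(long minutes) {
        long fullDays = minutes / DAY;
        long rest = minutes % DAY;
        long restFee = 0;
        if (rest > 0) {
            // Half hours beyond the first hour, rounded up.
            long extraHalfHours = Math.max(0, rest - 60 + 29) / 30;
            restFee = Math.min(24, 2 + extraHalfHours);
        }
        return fullDays * 24 + restFee;
    }

    // Garage: $2 per hour, $13 daily maximum, $78 per week (7th day free).
    static double garage(long minutes) {
        long fullWeeks = minutes / WEEK;
        long rest = minutes % WEEK;
        long fullDays = rest / DAY;
        long hours = (rest % DAY + 59) / 60;                  // rounded up
        long partialDay = Math.min(13, 2 * hours);
        // Six daily maximums already reach $78, so the cap makes the 7th day free.
        long partialWeek = Math.min(78, 13 * fullDays + partialDay);
        return fullWeeks * 78 + partialWeek;
    }
}
```

For instance, under these assumptions 90 minutes of short-term parking costs $3.00 (the first hour plus one half hour), and 7 hours in the garage is capped at the $13.00 daily maximum.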

3.2 WUT Execution Model in Denotational Semantics 

We show how to follow the methodology of denotational semantics to specify the user-web interactions of
the WUT execution model and their expected behaviors. To capture the semantics of web operations in the
parking calculator, we define the following semantic domains and necessary operations in Domain.java:
Semantic Domains:

    Price = double
    Duration, DTsf = long
    Time, Date, LotType = string
    AmPm = boolean
    Hour, Minute, Month, Day, Year = int

Semantic Operations:

    price: LotType x Duration → Price
    sfSub: DTsf x DTsf → Duration
    simpleFmt: Time x Date → DTsf
    date: Month x Day x Year → Date
    time: Hour x Minute → Time
    time24Fmt: AmPm x Time → Time

where the semantic operation time takes inputs <hour> and <minute> and returns a Time string in the
form of “<hour>:<minute>:00”; time24Fmt transforms the time string into a 24-hour format by
considering the am or pm option; date returns a Date string in the format of “<month>/<day>/<year>”;
simpleFmt combines a Date string and a 24-hour Time string into a DTsf/long type using the Java
SimpleDateFormat class; sfSub calculates the duration in a long type from entry to exit in
SimpleDateFormat; and price calculates the total parking fee based on the lot type and the duration.

^1 Parking Calculator Web: www.grr.org/ParkCalc.php; Parking Rates Web: www.grr.org/ParkingRates.php

[Figure 3(a) screenshot: PARKING CALCULATOR form with a “Calculate” button and the note “Please do not use military time increments in the calculator. Doing so will result in inaccurate estimates.”]

(a) Parking Calculator
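The time-related operations can be sketched in Java as follows. This is an illustration, not the authors' Domain.java: the class name TimeSketch, the choice that AmPm == true means PM, the UTC time zone, and the millisecond unit of DTsf and Duration are all assumptions made for the sketch.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch of the semantic operations; not the authors' Domain.java.
class TimeSketch {
    static String time(int hour, int minute) {
        return String.format(Locale.ROOT, "%d:%02d:00", hour, minute);
    }

    // Assumption: pm == true selects the PM option on the web form.
    static String time24Fmt(boolean pm, String time) {
        String[] parts = time.split(":");
        int hour = Integer.parseInt(parts[0]) % 12;    // 12 AM -> 0, 12 PM -> 12
        if (pm) {
            hour += 12;
        }
        return hour + ":" + parts[1] + ":" + parts[2];
    }

    static String date(int month, int day, int year) {
        return month + "/" + day + "/" + year;
    }

    // Combines a 24-hour Time string and a Date string into epoch milliseconds.
    static long simpleFmt(String time24, String date) {
        SimpleDateFormat fmt = new SimpleDateFormat("M/d/yyyy H:mm:ss", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));  // keeps the sketch free of DST gaps
        fmt.setLenient(false);
        try {
            return fmt.parse(date + " " + time24).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad date/time: " + date + " " + time24, e);
        }
    }

    // Duration from entry to exit, in milliseconds.
    static long sfSub(long exit, long entry) {
        return exit - entry;
    }
}
```

With these definitions, an entry at 9:30 am and an exit at 2:00 pm on the same date yield a duration of 270 minutes, which a price operation could then map to a fee.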

As shown in Figure 3(a), the typical web operation sequence is to (i) choose a parking lot type:
short-term, economy, surface, valet, or long-term garage; (ii) choose entry date and time; (iii) choose leaving
date and time; and (iv) press the “Calculate” button. Such a sequence of user-web interactive operations
can be described using a CFG as partially shown in Figure 4, where each grammar rule is followed by a
semantic valuation function, separated by ‘@@’.

Operations ::= Lot Duration Cal @@ (price Lot Duration)
Lot ::= Short | Economy | Surface | Valet | Garage
Short ::= 'new Select(driver.findElement(By.id("Lot")))
              .selectByVisibleText("Short-Term Parking");' @@ short
Duration ::= Entry Exit @@ (sfSub Exit Entry)
Entry ::= EnTime EnDate @@ (simpleFmt EnTime EnDate)
Exit ::= ExTime ExDate @@ (simpleFmt ExTime ExDate)
EnTime ::= AmPm EnTimeInput @@ (time24Fmt AmPm EnTimeInput)
EnDate ::= 'driver.findElement(By.id("EntryDate")).clear();
            driver.findElement(By.id("EntryDate")).sendKeys("' TDate '");' @@ TDate
TDate ::= [Month] '/' [Day] '/' [Year] @@ (date [Month] [Day] [Year])
[Month] ::= 1..12
EnTimeInput ::= 'driver.findElement(By.id("EntryTime")).clear();
            driver.findElement(By.id("EntryTime")).sendKeys("' TTime '");' @@ TTime
Cal ::= 'driver.findElement(By.name("Submit")).click();'

Figure 4: Partial CFG and Valuation Functions

Each CFG rule is followed by a valuation function which evaluates the expected semantics based
on its syntactic structure by calling pre-defined semantic operations in Domain.java, such that when a
test script, a sequence of Selenese statements, is derived in TAO, its expected behavior on the parking
calculator is automatically evaluated as a corresponding test oracle. Note that a unit CFG rule without
a semantic valuation function, for example, the rule “Lot ::= Short”, relays the semantic
value of Short to its parent rule; that is, it is equivalent to “Lot ::= Short @@ Short”.

Given the CFG and associated semantic valuation functions in Figure 4, TAO is expected to generate
a suite of JUnit test scripts, each of which consists of a sequence of Selenese statements simulating a
scenario of users’ operations on the parking calculator website. Thus, a terminal in the CFG should be a
legal Selenese statement, which utilizes a Selenium web driver to communicate with web browsers. For
example, the CFG rule for Cal in Figure 4 simulates a user clicking the Calculate button.
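The pairing of script text with semantic values can be sketched as follows. The sketch is hypothetical: the class DerivationSketch and its nested type Derived are illustrative names, the Selenese statements are handled only as strings (so no Selenium library is needed), and only the rules TDate, EnDate, and Cal from Figure 4 are covered.

```java
// Hypothetical sketch: a derived fragment pairs script text with its semantic value.
class DerivationSketch {
    static final class Derived {
        final String text;     // Selenese/Java statements derived from the CFG
        final String value;    // semantic value computed by the valuation function
        Derived(String text, String value) {
            this.text = text;
            this.value = value;
        }
    }

    // TDate ::= [Month] '/' [Day] '/' [Year] @@ (date [Month] [Day] [Year])
    static Derived tDate(int month, int day, int year) {
        String s = month + "/" + day + "/" + year;
        return new Derived(s, s);
    }

    // EnDate ::= '...clear(); ...sendKeys("' TDate '");' @@ TDate
    static Derived enDate(Derived tDate) {
        String text = "driver.findElement(By.id(\"EntryDate\")).clear();\n"
                + "driver.findElement(By.id(\"EntryDate\")).sendKeys(\"" + tDate.text + "\");";
        return new Derived(text, tDate.value);   // relays the semantic value of TDate
    }

    // Cal has no valuation function in Figure 4; it contributes only script text.
    static Derived cal() {
        return new Derived("driver.findElement(By.name(\"Submit\")).click();", null);
    }
}
```

The point of the sketch is that each derivation step produces both the statement to execute and the value that later feeds the oracle, which is what lets the test case and its expected behavior be generated together.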

For conciseness, we only show the CFG for the typical web operation sequence; in practice, we also
consider possible permutations among operations. For example:

Operations ::= Lot Duration Cal @@ (price Lot Duration) 

Operations ::= Duration Lot Cal @@ (price Lot Duration) 

Duration ::= Entry Exit @@ (sfSub Exit Entry) 

Duration ::= Exit Entry @@ (sfSub Exit Entry) 

Furthermore, we are able to generate each test case with one or more rounds of continuous parking
cost calculations, each of which is followed by a runtime check as shown below:

Test ::= Round
Test ::= Round Test
Round ::= Operations Fetch Assert @@ $[1] : Operations
Fetch ::= 'actualResult = driver.findElement(By.cssSelector("b")).getText();'
Assert ::= 'if (!consistent(actualResult, $[1])) fail();'

where the variable Test denotes a complete test case, which may contain multiple rounds of parking
calculations, each denoted by the variable Round. Additionally, the variable Operations specifies
a sequence of basic operations calculating one round of parking cost. Fetch specifies a
Selenese Java statement fetching the actual parking cost from the web at runtime, and Assert derives
a Java statement that calls a pre-defined Java method consistent to compare the fetched actual cost
with the expected cost generated by TAO, and reveals the testing failure if the costs are inconsistent. Note
that the tagging variable $[1] holds the semantic value of Operations, the expected parking
cost, and is embedded into the CFG definition of Assert as a part of the test script.

To generate an executable Java JUnit test script that runs through Selenium’s web drivers, each test script
is then combined with a standard Selenium JUnit test header and footer to form a complete JUnit test script.

4 Grammar-directed Delta Debugging 

When a JUnit test script fails a runtime consistency check, for example, when the actual cost calculated by
the airport online parking calculator differs from the expected test oracle, we call such a test script
a failure-inducing test case. In this section, we show how TAO can be extended to support automated
delta debugging to reduce failure-inducing test cases. TAO utilizes a grammar-based test generator to
derive a structured test case. Grammatical structures are also valuable for reducing failure-inducing test
cases to better understand the software failure. Test case reduction based on syntax is critical to
ensuring that reduced test cases are syntactically valid; as a test case is reduced, its expected semantics,
or oracle, changes as well, since denotational semantics maps a syntactic structure into mathematical
domains. In this section, we show how our semantics-based test oracle approach advances automated
delta debugging.


4.1 Delta Debugging 

Delta debugging (DD) has been a popular automated debugging approach to simplifying and
isolating failure-inducing inputs for fault localization. It simplifies a failing test case to a minimal one that
still produces the testing failure, where minimality means that any further simplification of the test case
would make the test succeed. DD assumes a set of changeable circumstances and uses a general binary
search within those changeable circumstances. However, identifying changeable circumstances in a test
case often requires syntactic information so that any involved change will not invalidate the test case
itself. Additionally, even if a set of changeable circumstances has been successfully identified, their
heterogeneity may prevent us from applying a simple binary search.

In contrast to DD, hierarchical delta debugging (HDD) [?] parses a test case into a hierarchical
structure of changeable circumstances based on its syntactic information so that the DD technique can be
applied on each structural level to maintain syntactic validity. However, the hierarchical structure
adopted in HDD is not a traditionally defined parse tree, but a reorganized structure suitable for the
application of DD. Constructing such a hierarchical structure may need a domain-specific parser.

4.2 Grammar-directed Test Reduction

TAO utilizes a given CFG to derive a structured test case. Such a CFG is also valuable for reducing
failure-inducing test cases for automated debugging, not only for the purpose of syntactic validity, but also for
providing clues on how the reduction can be automated in a systematic way. We use the following example
to illustrate grammar-directed test reduction strategies.

Example 3 Consider the following partial CFG for a simple structured programming language. Each
program contains a definition part (denoted by Def) and a sequence of statements (denoted by StmtSeq),
such as assignment, if, or loop statements.

Program ::= Def StmtSeq 
StmtSeq* ::= Stmt 
StmtSeq ::= Stmt StmtSeq 

Stmt ::= while Cond { StmtSeq } 

Now we present multiple grammar-directed reduction strategies, which can be applied to
reduce a test case while preserving syntactic validity.

[Reduction by Default]: TAO allows users optionally to specify a default grammar rule by simply
marking an asterisk (*) after the defining variable in a rule. For example, “StmtSeq* ::= Stmt” is a
default rule, which typically means that Stmt is one of the simplest yet valid structures for StmtSeq.
The reduction strategy by default searches for each node labeled by StmtSeq in the derivation tree, and
checks whether its child nodes can be simplified based on the default rule.


Figure 5: Reduction by Default

Assume that a failing test program ρ α β is found whose corresponding derivation tree is shown in
Figure 5, where ρ, α, and β, respectively, represent subtrees for a definition part, a single statement, and
a sequence of statements. With the reduction strategy by default, the reduced derivation tree would still
be syntactically valid, corresponding to its reduced program ρ α and the derivation as follows:

Program ⇒ Def StmtSeq ⇒* ρ StmtSeq ⇒ ρ Stmt ⇒* ρ α

[Reduction by Direct Recursion]: Similarly, considering the directly recursive rule
“StmtSeq ::= Stmt StmtSeq”, TAO can search for each occurrence of the StmtSeq node, and check whether
its child nodes involve a recursive node. If so, as shown in Figure 6, TAO can reduce the original
derivation into a reduced one, corresponding to a reduced test program ρ β and its valid derivation as follows:

Program ⇒ Def StmtSeq ⇒* ρ StmtSeq ⇒ ρ Stmt ⇒* ρ β

Figure 6: Reduction by Direct Recursion


[Reduction by Indirect Recursion]: Some CFG variables are often defined in an indirectly
recursive way. Consider the partial CFG in Example 3. The variable StmtSeq contains an indirectly
recursive definition through:

StmtSeq ⇒ Stmt ⇒ while Cond { StmtSeq }

Thus, we may have the reduction strategy by indirect recursion as shown in Figure 7. It reduces the
derivation into a valid one in a similar way to the reduction strategy by direct recursion, but searches for
alternative reductions of StmtSeq much deeper within its derivation subtree.


Figure 7: Reduction by Indirect Recursion

All three reduction strategies introduced above are based on the assumption that a reduced
failure-inducing test case gives better intuition for locating faults or understanding the causes of failure
in software testing. In practice, the reduction strategy by indirect recursion may compromise
runtime efficiency, because it is not known in advance which variables are indirectly recursive, and
looking for indirect recursion over the whole derivation tree is expensive. Therefore, TAO provides users
a declarative way to specify a list of applicable reduction strategies. Consider Example 1 for reducing
failure-inducing arithmetic expressions; users may specify a list of reduction strategies as follows:

TAO-reduction: {"default", "directRec", "indirectRec: {E,F,T}"}

For the sake of efficiency, users need to explicitly list those CFG variables which are both defined using
indirect recursion and used for reduction purposes.

4.3 Semantics-based Instant Oracle 

For automated delta debugging, grammar-directed reduction helps to maintain the syntactic validity of
reduced test cases. As a failure-inducing test case is reduced, its expected semantics, or oracle,
needs to be instantly updated as well, so that further automated reduction can continue
to minimize failure-inducing patterns for precise fault localization.

TAO has been extended with an instant oracle mechanism for dynamic test case reduction. Each test
case generated by TAO comes with a derivation tree, which can be further manipulated by applying any
of the reduction strategies, if applicable. A semantic tree is dynamically built up for obtaining an instant
oracle by applying pre-defined valuation functions (e.g., as shown in Figure 1) on a reduced derivation
tree. To support instant oracles, TAO stores semantic valuation functions in a mapping set indexed by
each CFG rule. Figure 8 shows an example of the data structures for storing semantic valuation functions,
corresponding to a partial derivation:

E^(1) ⇒ E^(2) + F^(3)

where a variable with a superscript (e.g., E^(1)) indicates that the variable is bound to a semantic node (also
highlighted by a double-line node) with the same label as shown in Figure 8. The tagging node 0, a
feature in TAO but irrelevant in this paper, defines a default tagging variable $[0] catching the semantic
value.

[Figure 8 diagram: nodes labeled lambda, intAdd, E, and F]

Figure 8: A semantic valuation function for E ::= E + F

4.4 Grammar-directed Delta Debugging (GDD) 

We present a new delta debugging algorithm, GDD, which incorporates grammar-directed reduction
strategies with the instant oracle generator in a search-based procedure. As shown in Algorithm 1, GDD
takes a failure-inducing test case, test, as an input, and repeatedly applies each applicable reduction
strategy to obtain a reduced one (lines 6-8) until no further reduction is possible (lines 9-12) in the
recursion. The sub-function GETDERIVATIONTREE(current) (line 5) returns the root of the derivation
tree associated with current; the sub-function APPLY (line 7) applies the reduction strategy
a on the derivation tree of current in a search-based way, and returns a reduced test case if applicable,
and otherwise returns the same current test case.

The function APPLY(a, test, root, pNode), defined in Algorifhm|2l applies fhe reduction sfrafegy a 
on each node pNode in fhe derivation free roofed af root in a fop-down, depfh-firsf order. For each non- 
ferminal node pNode (line 21), fhe sub-funcfion checks whefher fhe fhe reduction sfrafegy a is applicable 
on fhe subfree roofed af pNode (line 22). We highlighf fwo implemenfafion defails in fhis algorifhm. (1) 
We adopf fhe firsf-child/nexf-sibling dafa sfrucfure for represenfing derivation frees; fhus, reducing a 
subfree of pNode can be achieved by changing ifs firsf child link. (2) We use a sfore/resfore mechanism 
fo mainfain fhe original subfree of pNode (line 23). In case fhaf fhe reduced fesf case is nof failure- 
inducing, we have fo restore fhe original subfree of pNode (lines 27 — 30); ofherwise, fhe reduced one 
will be used for further reduction. The function REDUCED Y(pAode, a) applies fhe reducfion sfrafegy a 
on pNode', GETTestCase and INSTANtOracle refurn a fesf case and ifs oracle corresponding fo fhe 
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Algorithm 1 The GDD approach 

1: Input: test, a failure-inducing test case 
2: Output: a reduced failure-inducing test case 
3: function GDD(tejr) 

4: current <— test 

5: root GElDERIVATIONTREEfcurren?) 

6: for each applicable reduction strategy a do 

7: current <— APPLY(a, current, root, root) 

8: end for 

9: if {current is different from test) then 

10: return GT)0{current) 

11: else 

12: return current 

13: end if 

14: end function 


Algorithm 2 APPLY: a search-based reduction procedure 

15: Input: (1) a, a grammar-directed reduction strategy; (2) test, a failure-inducing test case; 

16: (3) root, the root of the derivation tree of tesf, (4) pNode, a node in the derivation tree 

17: Output: a reduced failure-inducing test case 
18: function APPLY(a, test, root, pNode) 

19: if (pAode is a CFG terminal) then 

20: return test 

21: else > pNode is a CFG variable 

22: if (a is applicable on pNode) then 

23: store the first child link of pNode > use first-child/next-sibling data structure 

24: REDUCES YipAode, a) 

25: reduced GETTESTCASE(roof) 

26: oracle iNSTANTORACLE(roof) 

27: if (TESTING(5f/r, reduced, oracle) fails) then 

28: test reduced > still failure-inducing 

29: else 

30: restore the first child link of pNode 

31: end if 

32: end if 

33: end if 

34: for each child node cNode of pNode do 

35: test t— APPLY(a, test, root, cNode) 

36: end for 

37: return fMf 

38: end function 


derivation tree rooted at root, respectively. The function TESTING(SUT, reduced, oracle) is invoked to
check whether the reduced test case is still failure-inducing. Only a failure-inducing reduced test case
is kept for further delta debugging. Whether or not the reduction succeeds, the function continues
by applying the reduction strategy a recursively to each child node of pNode (lines 34-36).
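
The control flow of Algorithms 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TAO's implementation: it uses a single hypothetical "hoisting" strategy (replace the children of a variable node with those of a descendant labelled by the same variable, which roughly covers reduction by direct and indirect recursion), it omits the default-rule strategy, it stores children in a list rather than a first-child/next-sibling structure, and the predicate `fails` is a made-up stand-in for TESTING(SUT, reduced, oracle). The names `Node`, `same_symbol_descendants`, `var`, and `tok` are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str                                   # CFG variable, or the token itself for a terminal
    children: list = field(default_factory=list)  # empty for terminals

    def is_terminal(self):
        return not self.children


def get_test_case(node):
    """GETTESTCASE: join the terminal tokens at the leaves, left to right."""
    if node.is_terminal():
        return node.symbol
    return " ".join(get_test_case(child) for child in node.children)


def same_symbol_descendants(node):
    """Yield proper descendants labelled with node's variable, nearest first."""
    queue = list(node.children)
    while queue:
        candidate = queue.pop(0)
        if candidate.children and candidate.symbol == node.symbol:
            yield candidate
        queue.extend(candidate.children)


def apply(fails, root, p_node):
    """APPLY for the single hoisting strategy assumed in this sketch."""
    if p_node.is_terminal():
        return
    saved = p_node.children                       # store the child link
    for candidate in same_symbol_descendants(p_node):
        p_node.children = candidate.children      # REDUCEBY: hoist the descendant
        if fails(get_test_case(root)):            # TESTING: still failure-inducing?
            break                                 # keep the reduction
        p_node.children = saved                   # otherwise restore the child link
    for child in p_node.children:                 # recurse whether or not we reduced
        apply(fails, root, child)


def gdd(fails, root):
    """GDD: repeat APPLY until the test case stops changing."""
    while True:
        before = get_test_case(root)
        apply(fails, root, root)
        if get_test_case(root) == before:
            return before


def var(symbol, *children):
    return Node(symbol, list(children))


def tok(text):
    return Node(text)


# Derivation tree of 2 * (5 - 3 + 4); the T ::= N step is collapsed for brevity.
five = var("E", var("F", var("T", tok("5"))))
three = var("F", var("T", tok("3")))
four = var("F", var("T", tok("4")))
inner = var("E", var("E", five, tok("-"), three), tok("+"), four)
root = var("E", var("F", var("F", var("T", tok("2"))), tok("*"),
                    var("T", tok("("), inner, tok(")"))))


def fails(test):
    """Hypothetical stand-in: the fault shows whenever '-' and '+' both occur."""
    return "-" in test and "+" in test


original = get_test_case(root)   # "2 * ( 5 - 3 + 4 )"
reduced = gdd(fails, root)       # "5 - 3 + 4"
```

The loop in `gdd` plays the role of the recursive call in line 10 of Algorithm 1: reduction restarts from the root until a full pass leaves the test case unchanged.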

Example 4 Assume that the Java application under test, as described in Example 1, mistakenly handles
arithmetic in a right-associative way instead of a left-associative way, but respects the precedences of
operators. For example, given the test case “2 * (5 - 3 + 4)”, the Java application returns the wrong result
-4 instead of 12, because it evaluates “5 - 3 + 4” as 5 - (3 + 4) = -2. We now illustrate how GDD
reduces the failure-inducing test case, given the CFG and valuation functions shown in Figure 1,
where the CFG rules (1)(4)(7) are default rules of the variables E, F, and T, respectively.
Figure 9: An Example of Reducing Failure-inducing Test Cases


Figure 9(a) shows the derivation tree for the original failure-inducing test case 2 * (5 - 3 + 4),
following the given CFG in Figure 1. By applying the reduction strategy by default, the APPLY function is
able to reduce the derivation tree in Figure 9(a) to a simplified yet still failure-inducing one as shown in
Figure 9(b), due to the fact that F ::= T is a default rule for F. Further reduction by default rules from
the derivation tree in Figure 9(b) is possible (e.g., some occurrences of E can be derived to F directly
based on the default rule of E); however, none of the test cases reduced in this way is failure-inducing.

Similarly, from the derivation tree in Figure 9(b), no further reduced failure-inducing test case can
be generated by applying the reduction strategy by direct recursion. For example, the derivation tree
in Figure 9(c) is a reduced one obtained by applying the reduction strategy by direct recursion of E; however,
the reduced test case (5 - 3) is not failure-inducing, since it cannot expose the fault in handling left-associativity.

By further applying the reduction strategy by indirect recursion of E, the APPLY function is able
to reduce the derivation tree in Figure 9(b) to a simplified one in Figure 9(d), which corresponds to a
maximally reduced yet precise failure-inducing pattern 5 - 3 + 4. The three grammar-directed reduction
strategies can be applied repeatedly until no further reduction can be made.
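
The fault assumed in Example 4 can be reproduced with a small evaluator. The sketch below is an illustration only (in Python for brevity, although the programs under test are Java): `evaluate` is a hypothetical recursive-descent evaluator whose `right_assoc` flag mimics the buggy program, while the default left-associative mode plays the role of the oracle. Division uses Python's floor division, which differs from Java's truncation on negative operands; the example avoids division for that reason.

```python
import re


def evaluate(text, right_assoc=False):
    """Evaluate an integer expression; right_assoc=True mimics the buggy program."""
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def apply_op(op, a, b):
        if op == "+":
            return a + b
        if op == "-":
            return a - b
        if op == "*":
            return a * b
        return a // b                      # floor division (see the note above)

    def chain(operand, ops):
        """Parse operand (op operand)* and fold it left or right."""
        values, seen = [operand()], []
        while peek() in ops:
            seen.append(take())
            values.append(operand())
        if right_assoc:                    # buggy order: a op (b op (c ...))
            result = values[-1]
            for op, value in zip(reversed(seen), reversed(values[:-1])):
                result = apply_op(op, value, result)
            return result
        result = values[0]                 # correct order: ((a op b) op c) ...
        for op, value in zip(seen, values[1:]):
            result = apply_op(op, result, value)
        return result

    def term():                            # T ::= N | ( E )
        if peek() == "(":
            take()
            value = expr()
            take()                         # consume the closing ")"
            return value
        return int(take())

    def factor():                          # F: a chain of T joined by * and /
        return chain(term, ("*", "/"))

    def expr():                            # E: a chain of F joined by + and -
        return chain(factor, ("+", "-"))

    return expr()


expected = evaluate("2 * (5 - 3 + 4)")                    # 12
actual = evaluate("2 * (5 - 3 + 4)", right_assoc=True)    # -4
```

With this stand-in, "5 - 3" evaluates to 2 in both modes and is therefore not failure-inducing, whereas "5 - 3 + 4" yields -2 instead of 6, which is why the reduction stops at that pattern.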

5 Experimental Results 

In this section, we first show our preliminary experimental results on automated delta debugging by
applying the GDD approach on applications which require structured inputs, and then show experimental
results on Selenium-based web testing.

5.1 Locating failure-inducing patterns on buggy Java programs 

Preliminary experiments have been conducted on testing and debugging 5 different buggy Java programs
(student submissions) which take an arithmetic expression as an input and perform its integer calculation,
as described in Example 1. We used the extended TAO with new capabilities, instant oracle and
grammar-directed reduction strategies, to generate 1000 arithmetic expressions and locate the failure-inducing
patterns. TAO takes the following inputs: a file of CFGs associated with its semantic functions (see the
footnote below) and a list of reduction strategies to apply,

TAO-reduction: {"default", "directRec", "indirectRec: {E,F,T}"}.

Table 1 shows our experimental results of reduced failure-inducing patterns obtained by applying our GDD
approach. Consider the first Java program. For example, “88 + (45 * 15 + (15/85 * 99/88 * 27/95 -
92 + 22) * 96 * 13/67/48)” is a generated failure-inducing test case; that is, given this expression input,
the actual evaluation result returned by the Java program is different from the expected result in the
oracle generated by TAO. Our GDD approach is able to reduce the failure-inducing input to “15/85/88”,
which implies the simplified failure-inducing pattern //, as shown in Table 1. By collecting all the
simplified failure-inducing patterns, we are able to speculate that the first Java program may not handle
right-associativity properly.
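
How a reduced input maps to a pattern such as // is not spelled out in the text; one plausible reading, sketched below as an assumption, is to drop the operands and keep the operators and parentheses in order. The helper `operator_pattern` is hypothetical and not part of TAO.

```python
import re


def operator_pattern(expression):
    """Keep operators and parentheses in order, dropping operands and spaces."""
    return "".join(re.findall(r"[-+*/()]", expression))


# Reduced inputs taken from the text: the one above and the one from Example 4.
patterns = {operator_pattern(e) for e in ["15/85/88", "5 - 3 + 4"]}
```

Collecting the patterns of all reduced inputs into a set removes duplicates, which gives the kind of summary listed per program in Table 1.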


Table 1: Failure-inducing Patterns and their Causes via GDD

programs   failure-inducing patterns (possible causes)
1          {-+, /*, */, //, --} (right-associativity)
2          {()} (parenthesis not properly handled)
3          {/-, /+, /*, */, +/, -/, *+, //, --} (right-associativity and operator precedence)
4          {-/+, -*+} (partial operator precedence ignorance)
5          {/-, *+, /+} (operator precedence ignorance)


The following table shows average reduction ratios on the lengths of failure-inducing inputs obtained by
applying the GDD approach to debugging the 5 buggy Java programs. For example, for the second program,
buggy due to parentheses issues, the lengths of failure-inducing test inputs can be reduced by 87% on
average. The overall average reduction ratio among the 5 programs is 81% on the lengths of failure-inducing
inputs. Automated instant oracle generation plays a key role in automating debugging, specifically in
identifying and reducing failure-inducing inputs.


Table 2: Average Reduction Ratio on Failure-inducing Inputs

programs          1     2     3     4     5     average
reduction ratio   80%   87%   80%   79%   79%   81%
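
The text does not define the reduction ratio precisely. The sketch below assumes it is the fraction of the input length removed, with length measured in tokens (an assumption; characters would work equally well), and then recomputes the average reported in Table 2. The function names are illustrative.

```python
import re


def token_count(expression):
    """Number of integer literals, operators and parentheses in the input."""
    return len(re.findall(r"\d+|[-+*/()]", expression))


def reduction_ratio(original, reduced):
    """Fraction of the original length removed by the reduction (assumed definition)."""
    return 1 - token_count(reduced) / token_count(original)


# Example 4: 9 tokens reduced to 5 tokens.
example_ratio = reduction_ratio("2 * (5 - 3 + 4)", "5 - 3 + 4")

# Per-program ratios from Table 2, in percent.
table2 = [80, 87, 80, 79, 79]
average = sum(table2) / len(table2)   # 81.0
```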


5.2 Selenium-based Web Test Script Reduction 

The second experiment shows how the extended TAO can be used for Selenium-based web testing by
incorporating semantics-based testing into the Selenium web testing framework to generate an executable
Selenese JUnit test suite, and by using GDD for automated debugging. We use the parking calculator
website at Gerald Ford International Airport for the experiment, where the CFGs and valuation functions are
partially shown in Figure 4. The experiment utilizes TAO to generate web test scripts and their associated
oracles, compares actual web testing results with expected oracles, and reveals testing failures
automatically. We collected a suite of 500 JUnit web scripts generated by TAO. Our experimental results reveal
that the average failure ratio is about 11.24%.

Footnote (on the TAO input file in Section 5.1): The input file is shown in Figure 1, with the addition
that CFG rules (1)(4)(7) are marked with asterisks to denote that they are default rules for variables
E, F, and T, respectively.

We further applied our GDD approach to reduce failing JUnit test scripts, given a list of reduction
strategies specified as follows:

TAO-reduction: {"default", "directRec"}. 

We further used 200 executable Selenium-based test scripts for the experiment of automated web 
testing and debugging by applying the GDD approach, where 28 executable test scripts cause testing 
failure. Each of those failing test scripts may contain one or multiple rounds of parking cost calculations, 
and in each round of parking cost calculation, users may set entry/exit dates and times in any order and 
modify them repeatedly. 

Our GDD approach was able to reduce a failing test script to a simplified one, with an average
reduction ratio of about 22%. We found that most failures were caused by different time-boundary
issues. For example, consider the short-term parking rates, where the daily maximum short-term parking
fee is $24; however, the web parking calculator could display $26 if the total parking time is 12 hours
and 30 minutes. We summarize the faults as follows:


Table 3: Faults Summary for the Online Parking Calculator

Lot Types                  Faults
Garage, Surface, Economy   1. weekly maximum was violated
                           2. daily maximum was violated
                           3. wrong parking cost was given when the leave time is earlier than the entry time
Short-term                 4. daily maximum was violated
                           5. half hour price was not properly calculated
Valet                      6. wrong parking cost was given when the leave time is earlier than the entry time


Both automated instant oracle generation and grammar-directed delta debugging are critical to
automating web testing and fault localization.
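
To make the daily-maximum fault concrete, the following sketch shows what an oracle for the short-term lot might compute. Only the $24 daily maximum comes from the text; the $2 first-hour rate, the $1 charge per additional half hour, and the way the rate restarts on each new day are assumptions made purely for illustration, and `short_term_fee` and `is_failure` are hypothetical names rather than TAO's valuation functions.

```python
import math

DAILY_MAX = 24        # daily maximum short-term fee, from the text
FIRST_HOUR = 2        # assumed rate for the first hour
PER_HALF_HOUR = 1     # assumed rate for each additional started half hour


def short_term_fee(minutes):
    """Expected short-term fee for a stay of the given length in minutes."""
    days, rest = divmod(minutes, 24 * 60)
    if rest == 0:
        partial = 0
    elif rest <= 60:
        partial = FIRST_HOUR
    else:
        partial = FIRST_HOUR + PER_HALF_HOUR * math.ceil((rest - 60) / 30)
    return days * DAILY_MAX + min(partial, DAILY_MAX)


def is_failure(displayed_fee, minutes):
    """A test fails when the displayed fee differs from the oracle."""
    return displayed_fee != short_term_fee(minutes)


# 12 hours 30 minutes: the uncapped fee exceeds the cap, so the oracle says 24.
oracle = short_term_fee(12 * 60 + 30)
```

Under these assumed rates, a displayed fee of $26 for a stay of 12 hours and 30 minutes is flagged as a failure because the oracle caps the fee at $24.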

6 Other Related Works and Discussions 

[Grammar-based Test Generation] Grammar-based test generation (GBTG) provides a systematic
approach to producing test cases from a given context-free grammar. Unfortunately, naive GBTG is
problematic due to the fact that exhaustive random test case production is often explosive. Prior work
on GBTG mainly relies on explicit annotational controls, such as production seeds [19], combinatorial
control parameters [9], and extra-grammatical annotations [7]. However, GBTG with explicit annotational
controls is not only a burden on users, but also causes unbalanced testing coverage, often failing
to generate many corner cases.

TAO takes a CFG as input, requires zero annotational control from users, and produces well-distributed 
test cases in a systematic way. TAO guarantees (1) the termination of test case generation, as long as a 
proper CFG, which has no inaccessible variables and unproductive variables, is given; and (2) that every 
generated test case is structurally different as long as the given CFG is unambiguous. 
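
For contrast, the sketch below shows naive random GBTG with a depth bound, which is not TAO's generation algorithm. The grammar is an assumed rendering of the arithmetic CFG used in Example 4 (the first rule of each variable is taken to be its default rule), and `GRAMMAR`, `generate`, and `max_depth` are illustrative names. Termination here is forced by switching to the default rules once the depth bound is reached, which is a manual control of the kind TAO aims to avoid.

```python
import random

# Assumed arithmetic CFG; the first rule of each variable is its default rule.
GRAMMAR = {
    "E": [["F"], ["E", "+", "F"], ["E", "-", "F"]],
    "F": [["T"], ["F", "*", "T"], ["F", "/", "T"]],
    "T": [["N"], ["(", "E", ")"]],
}


def generate(symbol, rng, depth=0, max_depth=6):
    """Expand symbol into a space-separated test case by random derivation."""
    if symbol == "N":                          # integer literal
        return str(rng.randint(1, 99))
    if symbol not in GRAMMAR:                  # terminal token
        return symbol
    rules = GRAMMAR[symbol]
    # Past the depth bound, use only default rules so the derivation terminates.
    rule = rules[0] if depth >= max_depth else rng.choice(rules)
    return " ".join(generate(s, rng, depth + 1, max_depth) for s in rule)


rng = random.Random(0)                         # fixed seed for repeatability
samples = [generate("E", rng) for _ in range(5)]
```

Nothing in this naive scheme prevents duplicate or structurally similar test cases, which illustrates the coverage problem described above.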
[Model-based Web Testing] Much previous research on automated testing of web applications uses
model-based web testing, such as using finite state machines [1], a model of application state space [10],
or an application’s event space [15], to name a few. These approaches rely on a heuristic approach for
generating test cases based on an application model, but generally seek extra assistance to maintain good
testing coverage. For example, the Artemis tool [3] incorporates model-based testing with a
feedback-directed strategy for automated testing of web applications. However, these testing approaches for web
applications, mainly focusing on test generation automation, lack a further mechanism for automating
test oracle generation and delta debugging.

Our semantics-based automated web testing also belongs to the category of model-based web testing, 
since grammar-based test generation typically uses CFGs to describe a structured input data model or a 
user-web interactive behavior model. 

[Automated Delta Debugging] Artzi et al. [2] proposed a white-box testing technique, which monitors
the execution of the WUT to record symbolic path constraints, and then uses model checking to generate
test inputs for dynamic web applications. The resulting tool, Apollo, can further minimize the set of
constraints which lead to the failure-inducing inputs by intersecting sets of constraints among
failure-inducing inputs. Our GDD approach is a black-box testing technique for general users, who may not
have the source code of the WUT, but are able to generate test scripts for testing web applications. The
GDD approach can be used to reduce either test inputs or test scripts in a grammar-directed systematic
way. TAO combines grammar-based test generation, semantics-based oracle generation, and
grammar-directed delta debugging as an integrated tool.


7 Conclusions 

We presented TAO, a testing tool performing automated test and oracle generation based on a
semantics-based approach, and showed a new automated web testing framework that integrates TAO with
Selenium-based web testing for web testing automation. Our framework is able to generate a suite of executable
JUnit test scripts by utilizing grammar-based test generation and semantics-based oracle generation.

The semantics-based web testing approach is also valuable for promoting automated delta debugging,
as it provides sufficient flexibility in supporting grammar-directed reduction strategies and
semantics-based instant oracle generation. We extended TAO with a new grammar-directed delta debugging approach
(GDD) for automated delta debugging. As shown in the experiments, not only can TAO be used to reduce
and locate failure-inducing input patterns for those applications which require structured inputs, but it
can also reduce web test scripts to assist fault localization.


8 Acknowledgements 

We want to thank the anonymous referees for their valuable comments that improved the presentation. 
This research project has been partially supported by the First Data Corporation. 


References 

[1] Anneliese A. Andrews, Jeff Offutt & Roger T. Alexander (2005): Testing Web applications by modeling with
FSMs. Software & Systems Modeling 4(3), pp. 326-345, doi:10.1007/s10270-004-0077-7.




[2] S. Artzi, A. Kiezun, J. Dolby, F. Tip, D. Dig, A. Paradkar & M. D. Ernst (2010): Finding bugs in web
applications using dynamic test generation and explicit-state model checking. IEEE Transactions on Software
Engineering 36(4), pp. 474-494, doi:10.1109/TSE.2010.31.

[3] Shay Artzi, Julian Dolby, Simon Holm Jensen, Anders Møller & Frank Tip (2011): A Framework for
Automated Testing of Javascript Web Applications. In: the 33rd ICSE, ACM, pp. 571-580,
doi:10.1145/1985793.1985871.

[4] Giuseppe A. Di Lucca & Anna Rita Fasolino (2006): Testing Web-based Applications: The State of the Art
and Future Trends. Information and Software Technology 48(12), pp. 1172-1186,
doi:10.1016/j.infsof.2006.06.006.

[5] Hai-Feng Guo, Liang Cao, Yushu Song & Zongyan Qiu (2014): Automated Test Oracle Generation via
Denotational Semantics. In: 14th International Conference on Quality Software (QSIC), pp. 139-144,
doi:10.1109/QSIC.2014.38.

[6] Hai-Feng Guo & Zongyan Qiu (2014): A dynamic stochastic model for automatic grammar-based test
generation. Software: Practice and Experience, doi:10.1002/spe.2278.

[7] Daniel Malcolm Hoffman, David Ly-Gagnon, Paul Strooper & Hong-Yi Wang (2011): Grammar-based test
generation with YouGen. Software: Practice and Experience 41(4), pp. 427-447, doi:10.1002/spe.1017.

[8] UNO LASER Lab (2014): TAO online. Available at http://laser.ist.unomaha.edu/tao_home/.

[9] Ralf Lämmel & Wolfram Schulte (2006): Controllable combinatorial coverage in grammar-based testing.
In: International conference on Testing of Communicating Systems, pp. 19-38, doi:10.1007/11754008_2.

[10] A. Mesbah & A. van Deursen (2009): Invariant-based automatic testing of AJAX user interfaces. In: 31st
Int. Conf. on Software Engineering, doi:10.1109/ICSE.2009.5070522.

[11] R. Milne & C. Strachey (1976): A Theory of Programming Language Semantics. Chapman and Hall, London.

[12] Ghassan Misherghi & Zhendong Su (2006): HDD: Hierarchical Delta Debugging. In: Proceedings of the
28th International Conference on Software Engineering, ICSE ’06, ACM, New York, NY, USA, pp. 142-151,
doi:10.1145/1134285.1134307.

[13] Sahi Pro: A Web Test Automation Tool, http://sahipro.com/.

[14] Sreedevi Sampath, Sara Sprenkle, Emily Gibson & Lori Pollock (2007): Applying concept analysis to
user-session-based testing of web applications. IEEE Transactions on Software Engineering
33(10), pp. 643-658, doi:10.1109/TSE.2007.70723.

[15] P. Saxena, D. Akhawe, S. Hanna, S. McCamant, D. Song & F. Mao (2010): A symbolic execution framework
for JavaScript. In: 31st IEEE Symp. on Security and Privacy, doi:10.1109/SP.2010.38.

[16] David A. Schmidt (1986): Denotational Semantics: A Methodology for Language Development. Wm. C.
Brown Publishers.

[17] Dana Scott & Christopher Strachey (1971): Toward a mathematical semantics for computer languages.
Oxford Programming Research Group Technical Monograph, PRG-6.

[18] Selenium: Selenium Browser Automation, http://www.seleniumhq.org/. Accessed: 2012-08-30.

[19] Emin Gün Sirer & Brian N. Bershad (1999): Using production grammars in software testing. In: the 2nd
conference on Domain-specific languages, pp. 1-13, doi:10.1145/331960.331965.

[20] A. Stout (2001): Testing a Website: Best Practices. The Revere Group.

[21] Watir: Web Application Testing in Ruby, http://watir.com/.

[22] Andreas Zeller & Ralf Hildebrandt (2002): Simplifying and Isolating Failure-inducing Input. IEEE
Transactions on Software Engineering 28(2), pp. 183-200, doi:10.1109/32.988498.


